Train Big, Plan Smart - How to Calculate Memory and Estimate GPUs for LLMs

Unlocking the Basics

Training large language models isn’t just a question of whether you can do it; it’s a question of how smartly you do it. If you’ve ever wondered how researchers train massive AI models with billions of parameters, it all starts with smart planning. Behind every successful LLM training run is a... [Read More]
Tags: LLMTraining, Memory estimation, GPU sizing, Generative AI, Model Scaling

Weight Initialization - The First Principle

Journey from basics to advanced.

Weight initialization is the most underrated concept in deep learning. I have seen many new deep learning practitioners, and even some experienced ones, ignore this important concept. Unlike some existing tutorials and blogs, this post will not talk about why you should not initialize your weights with all... [Read More]
Tags: weight, initialization, deep learning, xavier, kaiming, eigenvalue, eigenvector